Abstract


Let $\pi_1,\ldots,\pi_k$ be Weibull populations with a common known shape parameter and with unknown scale parameters. The goal is to find the population with the largest scale parameter. From each population, Type II-censored observations are available at two stages, where censoring at stage 1 (2) occurs at the $q$-th ($r$-th) failure. Two-stage procedures with screening at the first stage are considered which are optimum within the class of permutation invariant procedures, in terms of the risk with respect to a large class of loss functions. For the procedure with a fixed subset size at stage 1, the least favorable parameter configuration under the indifference zone approach is of the slippage type, which makes it feasible to control the infimum of the probability of a correct selection. Some extensions of the results are discussed at the end.
1.  Introduction.

Suppose we wish to find the most reliable of $k$ types of components $\pi_1,\ldots,\pi_k$, say. Let the failure times follow a Weibull model with density w.r.t. the Lebesgue measure, confined to the positive real line,

$$f(x\mid\theta_i)=\alpha\,\theta_i\,x^{\alpha-1}\exp(-\theta_i x^{\alpha}),\qquad x>0,\tag{1}$$

where $\alpha>0$ is fixed and known, and $\theta_i^{-1/\alpha}>0$ is the unknown scale parameter associated with $\pi_i$, $i=1,\ldots,k$. Thus our goal becomes to find the $\pi_i$ which is associated with the smallest $\theta_i$, $i=1,\ldots,k$. To keep the presentation of our material simple, we may assume that it is unique. This is not a serious restriction: in case of ties, we may be content with the selection of any of the most reliable components, and our results hold analogously in this case.
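Under (1), $X^\alpha$ is exponentially distributed with rate $\theta_i$, which gives a simple way to evaluate and simulate the model. A minimal Python sketch; the function names are hypothetical and the parameter values in the checks are arbitrary:

```python
import math
import random


def weibull_density(x: float, theta: float, alpha: float) -> float:
    """Density (1): alpha * theta * x**(alpha - 1) * exp(-theta * x**alpha), x > 0."""
    if x <= 0:
        return 0.0
    return alpha * theta * x ** (alpha - 1) * math.exp(-theta * x ** alpha)


def sample_failure_time(theta: float, alpha: float, rng: random.Random) -> float:
    """Draw X from (1), using the fact that X**alpha is exponential with rate theta."""
    return rng.expovariate(theta) ** (1.0 / alpha)


# Midpoint-rule check that (1) integrates to about 1 for theta = 0.5, alpha = 2.
step = 1e-3
mass = sum(weibull_density((j + 0.5) * step, 0.5, 2.0) for j in range(20000)) * step

# Sample check: the mean of X**alpha should be close to 1 / theta = 2.
rng = random.Random(7)
mean_power = sum(sample_failure_time(0.5, 2.0, rng) ** 2.0 for _ in range(20000)) / 20000
```

Since $P(X>x)=\exp(-\theta_i x^\alpha)$ decreases in $\theta_i$, a smaller $\theta_i$ means stochastically longer lifetimes, which is why the most reliable type is the one with the smallest $\theta_i$.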

We shall consider 2-stage selection procedures, with screening at the first stage, as discussed in Miescke (1984), which are based on Type II-censored failure times observed at both stages. In the following, all failure times considered are assumed to be independent. To describe the sampling process from the $k$ populations $\pi_1,\ldots,\pi_k$, it is sufficient to do so for one particular population $\pi_i$, say.

At stage 1, $n_i>0$ components of type $\pi_i$ are tested simultaneously until the $q$-th failure occurs. If this type of component is not screened out at stage 1, $m_i>0$ components are added, or $-m_i>0$ components are withdrawn, and then the $n_i-q+m_i$ components are further tested simultaneously until the $r$-th failure at this stage occurs. Here we assume, for obvious reasons, that $q\ge 1$, $r\ge 1$, $n_i\ge q$, and $n_i+m_i\ge q+r$. It should be pointed out that neither $q$ nor $r$ depends on $i$. We shall see later that this is crucial for obtaining permutation invariance in our decision problem, which in turn provides the basis for finding optimum permutation invariant decision procedures.

The next step is a reduction by sufficiency. Because of the assumed independence of all failure times, this can also be done for one particular population, $\pi_i$, say. Let $U_{i,1}\le U_{i,2}\le\cdots\le U_{i,q}$ denote the ordered failure times up to failure $q$ of $\pi_i$ at stage 1. Likewise, let $V_{i,1}\le V_{i,2}\le\cdots\le V_{i,r}$ denote the ordered failure times up to failure $r$ of $\pi_i$ at stage 2, measured from $U_{i,q}$ onwards. Let


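$$U_i=\sum_{j=1}^{q}U_{i,j}^{\alpha}+(n_i-q)\,U_{i,q}^{\alpha},\qquad V_i=\sum_{j=1}^{r}V_{i,j}^{\alpha}+(n_i+m_i-q-r)\,V_{i,r}^{\alpha},\tag{2}$$

and

$$T_i=U_i+V_i.$$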
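The statistics in (2) are total-time-on-test quantities computed on the $\alpha$-th power scale. A minimal Python sketch, assuming the ordered failure times of one population are available as ascending lists and that the stage-2 times are measured from $U_{i,q}$ as above; the helper names are hypothetical:

```python
def total_time_on_test(ordered_times, n_on_test, alpha):
    """Sum of the alpha-th powers of the observed ordered failure times, plus
    (n_on_test - #failures) copies of the last one (Type II censoring)."""
    failures = len(ordered_times)
    if failures == 0 or n_on_test < failures:
        raise ValueError("need at least one failure and n_on_test >= #failures")
    powers = [t ** alpha for t in ordered_times]
    return sum(powers) + (n_on_test - failures) * powers[-1]


def stage_statistics(u_times, v_times, n_i, m_i, alpha):
    """U_i, V_i and T_i = U_i + V_i as in (2).

    u_times: U_{i,1} <= ... <= U_{i,q}; n_i units are on test at stage 1.
    v_times: V_{i,1} <= ... <= V_{i,r}, measured from U_{i,q};
             n_i - q + m_i units are on test at stage 2.
    """
    q = len(u_times)
    u = total_time_on_test(u_times, n_i, alpha)
    v = total_time_on_test(v_times, n_i + m_i - q, alpha)
    return u, v, u + v
```

For example, with $\alpha=1$, stage-1 failures at 1 and 2 among $n_i=5$ units, and one stage-2 failure at 0.5 with $m_i=0$, this gives $U_i=1+2+3\cdot 2=9$ and $V_i=0.5+2\cdot 0.5=1.5$.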


Then the distributional properties of $U_i$, $V_i$, and $T_i$ can be summarized as follows.
Lemma 1.  For every population $\pi_i$, the following holds.

(a) $U_i$ ($V_i$) is sufficient for $\theta_i$ at stage 1 (2), and $T_i$ is sufficient for $\theta_i$ at both stages combined.

(b) $U_i$ and $V_i$ are independent.

(c) $2\theta_iU_i$ ($2\theta_iV_i$, $2\theta_iT_i$) is chi-squared distributed with $2q$ ($2r$, $2q+2r$) degrees of freedom.

Proof: Most of these facts are well known, so we outline the proof only briefly. The statement concerning $U_i$ in (a) follows from looking at the likelihood function at stage 1, as is done in Tsokos and Rao (1979). The statement concerning $U_i$ in (c) is proved in Gnedenko et al. (1969), sec. 3.3, for the case $\alpha=1$, and it extends immediately to any other value of $\alpha$. Finally, by considering, instead of the original failure times, their $\alpha$-th powers, which are exponentially distributed with scale parameter $\theta_i^{-1}$, the proof can be completed by using the lack of memory of the exponential law or, more precisely, the strong Markov property inherent in the sampling process described in terms of these exponential random variables, as discussed in Feller (1971), section 1.6.
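Lemma 1(c) can be checked by simulation. The following Python sketch is our own illustration (the function name `simulate_u_v` and all numeric settings are hypothetical). For simplicity it takes $\alpha=1$, so that lifetimes are exponential and (2) reduces to the usual total time on test; by the $\alpha$-th power argument in the proof, this is the essential case:

```python
import random
import statistics


def simulate_u_v(theta, n, m, q, r, rng):
    """One two-stage Type II-censored experiment with exponential lifetimes
    (alpha = 1). Returns (U, V) computed as in (2)."""
    # Stage 1: n units on test until the q-th failure.
    stage1 = sorted(rng.expovariate(theta) for _ in range(n))
    u = sum(stage1[:q]) + (n - q) * stage1[q - 1]
    # Stage 2, measured from U_{i,q}: by lack of memory, the residual lifetimes of
    # the n - q survivors and of the m added units (m < 0 withdraws units) can be
    # drawn afresh from Exp(theta).
    stage2 = sorted(rng.expovariate(theta) for _ in range(n - q + m))
    v = sum(stage2[:r]) + (n + m - q - r) * stage2[r - 1]
    return u, v


rng = random.Random(2024)
theta, n, m, q, r = 0.7, 8, 3, 3, 4
draws = [simulate_u_v(theta, n, m, q, r, rng) for _ in range(20000)]
two_theta_u = [2 * theta * u for u, _ in draws]
two_theta_t = [2 * theta * (u + v) for u, v in draws]

# Chi-squared with 2q d.f. has mean 2q and variance 4q; with 2q + 2r d.f., mean 2q + 2r.
print(statistics.mean(two_theta_u), statistics.variance(two_theta_u))
print(statistics.mean(two_theta_t))
```

With $q=3$ and $r=4$, the printed values should be close to 6 and 12 for $2\theta_iU_i$, and close to 14 for $2\theta_iT_i$, up to Monte Carlo error.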

Returning to the joint consideration of our $k$ populations, all facts stated so far can be carried over in a natural way, since $(U_i,V_i,T_i)$, $i=1,\ldots,k$, are independent random vectors. For notational convenience, let in the following $\underline{U}=(U_1,\ldots,U_k)$, $\underline{V}=(V_1,\ldots,V_k)$, and $\underline{T}=(T_1,\ldots,T_k)$. Next we introduce a class of 2-stage selection procedures for the given decision problem.

Definition. A 2-stage selection procedure acts as follows. After all observations at stage 1 have been made, a non-empty subset $s\subseteq\{1,\ldots,k\}$ is selected. All populations $\pi_i$ with $i\notin s$ are discarded. If $s=\{j\}$, say, the final decision “$\pi_j$ is the most reliable type” is made. If $s$ contains more than one element, stage 2 is entered. The sampling process is continued for all $\pi_i$ with $i\in s$ as described before, and then for some $j\in s$ the final decision “$\pi_j$ is the most reliable type” is made.

Let $L(\theta,(s,i))$ be a real-valued loss which occurs at $\theta=(\theta_1,\ldots,\theta_k)$ if $s$ is selected at stage 1 and the final decision is in favor of $\pi_i$, $i\in s\subseteq\{1,\ldots,k\}$. We assume that it is integrable, so that the associated risk function exists. Moreover, let it be permutation invariant as defined in Gupta and Miescke (1984), and let it favor the selection of more reliable components in the following way:

(3) $L(\theta,(s,i))\le L(\theta,(\tilde s,j))$ if

(a) $i,j\in s=\tilde s$, $\theta_i\le\theta_j$, or

(b) $s\setminus\{i\}=\tilde s\setminus\{j\}$, $\theta_i\le\theta_j$, or

(c) $i=j$, $s\setminus\{u\}=\tilde s\setminus\{v\}$, $\theta_u\le\theta_v$.
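Condition (3) can be checked mechanically for a given loss and a given $\theta$ with distinct coordinates. The Python sketch below (hypothetical helper names; indices are 0-based) enumerates all pairs of decisions $(s,i)$ and $(s',j)$, with $s'$ playing the role of the second subset in (3), and tests (a)-(c). As an example it uses the 0-1 loss, which is zero if and only if the final choice has the smallest $\theta_i$:

```python
from itertools import combinations


def zero_one_loss(theta, s, i):
    """0-1 loss: zero iff the final choice i has the smallest theta (most reliable)."""
    best = min(range(len(theta)), key=lambda j: theta[j])
    return 0 if i == best else 1


def satisfies_condition_3(loss, theta):
    """Brute-force check of (a)-(c) in (3) over all decisions (s, i) and (s2, j)."""
    k = len(theta)
    subsets = [frozenset(c) for size in range(1, k + 1)
               for c in combinations(range(k), size)]
    for s in subsets:
        for s2 in subsets:
            for i in s:
                for j in s2:
                    cond_a = s == s2 and theta[i] <= theta[j]
                    cond_b = s - {i} == s2 - {j} and theta[i] <= theta[j]
                    cond_c = i == j and any(
                        s - {u} == s2 - {v} and theta[u] <= theta[v]
                        for u in s for v in s2)
                    if (cond_a or cond_b or cond_c) and loss(theta, s, i) > loss(theta, s2, j):
                        return False
    return True
```

For $\theta=(0.5,1,2,3)$ the 0-1 loss passes the check, whereas a loss such as $-\theta_i$, which rewards choosing less reliable components, fails it.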

It  can  be  seen  that  under  such  a  loss  function,  the  decision  problem  is  invariant  under 
the  group  of  permutations.  This  justifies  restricting  our  further  considerations  to  2-stage 
selection  procedures  which  are  permutation  invariant.  A  rigorous  definition  of  this  class 
can  be  found  in  Gupta  and  Miescke  (1984).  The  optimum  rules  within  this  class  are,  as 
we  shall  see  later,  of  the  following  form. 
Definition: Let $R:(0,\infty)^k\to\{1,2,\ldots,k\}$ be a symmetric, Borel-measurable function. Let us consider $R(\underline{U})$ as a decision rule that determines the size of the subset to be selected at stage 1. Then $P(R)$ is the 2-stage selection procedure which selects at stage 1 the populations associated with the $R(\underline{U})$ largest $U_i$'s, and makes the final decision at stage 2 in terms of the largest observed $T_i$.
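The two steps of $P(R)$ can be sketched in Python as follows. This is an illustration only: the function name is hypothetical, indices are 0-based, ties (which occur with probability zero) are broken by sort order, and `v` holds the stage-2 statistics $V_i$, of which only those of the selected populations are read:

```python
from typing import Callable, List, Sequence, Tuple


def procedure_p_r(u: Sequence[float], v: Sequence[float],
                  subset_size: Callable[[Sequence[float]], int]) -> Tuple[List[int], int]:
    """Sketch of P(R): keep the R(U) populations with the largest U_i, then
    pick the selected population with the largest T_i = U_i + V_i."""
    k = len(u)
    size = subset_size(u)
    if not 1 <= size <= k:
        raise ValueError("subset size must lie in {1, ..., k}")
    ranked = sorted(range(k), key=lambda i: u[i], reverse=True)
    selected = ranked[:size]
    if size == 1:
        return selected, selected[0]  # stage 2 is not entered
    final = max(selected, key=lambda i: u[i] + v[i])
    return selected, final
```

For instance, with $U=(5,1,3,4)$, $V=(0,100,0,2)$ and $R\equiv 2$, populations 0 and 3 survive stage 1 and population 3 is chosen; the large $V$ of population 1 is never used because it was screened out.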

Optimality of $P(R)$ among all procedures which employ the same subset-size rule $R$ will be shown in Section 2. Further properties of $P(R)$ will also be discussed there. Of special interest is the case of a constant $R$, $R=t$, say. This will be considered in Section 3 under the 0-1 loss function, which is zero if and only if the most reliable type of component is finally selected. The risk, i.e. the probability of an incorrect selection, of $P(t)$ will be shown to have a natural least favorable parameter configuration in the indifference zone approach of Bechhofer (1954), which makes it feasible to control, on a certain subset of the parameter space, the infimum of the probability of a correct selection $P_\theta(CS\mid P(t))$. Finally, in Section 4, extensions of our results to $p$-stage selection procedures will be described, and some open questions for further research will be presented.

2.  Optimality  of  the  Procedures  P(R). 

The  first  of  our  results  establishes  optimality  of  procedure  P(R)  within  the  class  of 
all  permutation  invariant  2-stage  selection  procedures  which  employ  the  same  subset  size 
rule  R. 

Theorem 1.  Let $R$ be a subset-size rule, and let $L$ be a loss function, with properties as described in Section 1. Then for every permutation invariant 2-stage selection procedure $P$ which employs $R$ at stage 1, we have

$$\rho(\theta,P(R))\le\rho(\theta,P),\qquad\theta\in(0,\infty)^k,\tag{4}$$

where $\rho(\theta,P)$ denotes the risk, i.e. expected loss, of $P$ at $\theta$.

Proof: $\underline{U}$ and $\underline{V}$ can be considered as two random vectors which are generated through

$$U_i=\sum_{j=1}^{q}C_{i,j},\qquad V_i=\sum_{j=q+1}^{q+r}C_{i,j},\qquad i=1,\ldots,k,\tag{5}$$

where the $C_{i,j}$'s are generic random variables which are mutually independent. For $i\in\{1,\ldots,k\}$ and $j\in\{1,\ldots,q+r\}$, $2\theta_iC_{i,j}$ follows a chi-squared distribution with 2 degrees of freedom, i.e. $C_{i,j}$ has the following density on the positive real line:

$$g(x\mid\vartheta_i)=|\vartheta_i|\exp(\vartheta_i x),\qquad x>0,\quad\vartheta_i=-\theta_i.\tag{6}$$

It can be seen now that almost all assumptions which are made in Gupta and Miescke (1984) are fulfilled, where the underlying exponential family of densities is of the form

$$h(x\mid\vartheta)=c(\vartheta)\exp(\vartheta x)\,d(x),\qquad x\in\mathbb{R},\ \vartheta\in\Omega\subseteq\mathbb{R}.\tag{7}$$
If $d$ were log-concave, i.e. if the exponential family were strongly unimodal, our proof would be completed by applying Corollary 2 of Gupta and Miescke (1984). However, the function $d$ belonging to (6) is the indicator function of the positive real line, which is obviously not log-concave.

A careful examination of the proof of the key Lemma 2 in Gupta and Miescke (1984) fortunately shows that if the family (7) has the positive real line as a common support, then all results in that paper remain valid provided every $n$-fold convolution of $d$ is log-concave on the positive real line. Since the $n$-fold convolution of the indicator function of the positive real line equals $x^{n-1}/(n-1)!$ at $x>0$, which is indeed log-concave for $n\ge 1$, the proof of this theorem is completed.
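The convolution identity and the log-concavity claim used above can be verified directly. Writing $d_n$ for the $n$-fold convolution of $d=\mathbf{1}_{(0,\infty)}$ (notation introduced here only for this derivation), induction on $n$ gives, for $x>0$:

$$
\begin{aligned}
d_1(x) &= 1 = \frac{x^{0}}{0!},\\
d_{n+1}(x) &= \int_{-\infty}^{\infty} d_n(y)\,d(x-y)\,\mathrm{d}y = \int_0^x \frac{y^{n-1}}{(n-1)!}\,\mathrm{d}y = \frac{x^{n}}{n!},\\
\frac{\mathrm{d}^2}{\mathrm{d}x^2}\log d_n(x) &= \frac{\mathrm{d}^2}{\mathrm{d}x^2}\Big[(n-1)\log x-\log\,(n-1)!\Big] = -\frac{n-1}{x^{2}}\le 0 .
\end{aligned}
$$

Hence $\log d_n$ is concave on $(0,\infty)$ for every $n\ge 1$; for $n=1$ it is constant.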

The next result establishes uniqueness of the optimum procedure $P(R)$, and some consequences. Let $\mathcal{D}_I(R)$ denote the class of all permutation invariant 2-stage selection procedures which employ $R$ at stage 1, and let $\mathcal{P}_I=\{P\mid P\in\mathcal{D}_I(R),\ R\text{ subset-size rule}\}$. For a moment, let us also consider the larger classes $\mathcal{D}(R)$ and $\mathcal{P}$, say, where the procedures, including their subset-size rules at stage 1, are not necessarily permutation invariant (symmetric, for brevity).

Theorem 2.  Let $L$ be any loss function with properties as described in Section 1. Then the following holds.

(a) For every symmetric subset-size rule $R$, $P(R)$ is the unique optimum procedure in $\mathcal{D}_I(R)$ in the sense of (4). Moreover, $P(R)$ is admissible in $\mathcal{D}(R)$.

(b) If there exists a minimax procedure in $\mathcal{P}_I$ which employs $R_0$, say, at stage 1, then $P(R_0)$ is minimax in $\mathcal{P}_I$, and both procedures are minimax in $\mathcal{P}$.

(c) The class $\{P(R)\mid R\text{ subset-size rule}\}$ is essentially complete in $\mathcal{P}_I$.

Proof. Let $L$ be a loss function which has the assumed properties. Let $R$ be a symmetric subset-size rule, and let $\theta\in(0,\infty)^k$, where not all of the $\theta_i$'s are equal, be fixed. Then it can be shown, as in Gupta and Miescke (1984), that $P(R)$ is the unique Bayes rule in $\mathcal{D}(R)$ with respect to the symmetric prior which gives probability mass $1/k!$ to each of the $k!$ permutations of $\theta=(\theta_1,\ldots,\theta_k)$. Since the Bayes risk is equal to $\rho(\theta,P(R))$, the first parts of (a) and (b), as well as (c), follow from Theorem 1. The second part of (a) holds since a unique Bayes rule is always admissible. The second part of (b) follows from Blackwell and Girshick (1954), sec. 8.6, and the fact that the group of permutations is finite. This completes the proof of the theorem.

The last result in this section confirms the intuitive conjecture that sampling more information improves the optimum procedure. Let $P(R\mid n,m,q,r)$ be the procedure $P(R)$ which employs $R$ at stage 1, where $q\ge 1$, $r\ge 1$, $n=(n_1,\ldots,n_k)$ with $n_i\ge q$, and $m=(m_1,\ldots,m_k)$ with $m_i+n_i\ge q+r$, $i=1,\ldots,k$, may now be variable. Then we can state the following.

Theorem 3.  Let $L$ be any loss function which does not depend on $n$ and $m$, with properties as described in Section 1. Then for every symmetric subset-size rule $R$, the risk of $P(R)$ at $\theta$ does not depend on $n$ and $m$. Let it be denoted by $\rho(\theta;R,q,r)$, say. At every $\theta\in(0,\infty)^k$ where not all of the $\theta_i$'s are equal, it has the following properties:

$$\rho(\theta;R,q,r)>\rho(\theta;R,q+1,r),\quad\text{if } n_i>q,\ n_i+m_i>q+r,\ i=1,\ldots,k,\tag{8}$$

$$\rho(\theta;R,q,r)>\rho(\theta;R,q,r+1),\quad\text{if } n_i+m_i>q+r,\ i=1,\ldots,k,\tag{9}$$

$$\rho(\theta;R,q,r)>\rho(\theta;R,q+1,r-1),\quad\text{if } n_i>q,\ i=1,\ldots,k.\tag{10}$$

Proof. Let $L$ and $R$ be given as stated in the theorem. $P(R)$ utilizes all of the relevant information contained in the observations through $\underline{U}$ and $\underline{T}$ at stage 1 and stage 2, respectively. In view of Lemma 1, the joint distribution of $\underline{U}$ and $\underline{T}$ does not depend on $n$ and $m$. Hence the risk function of $P(R)$ has the same property.

Because the arguments are similar, we give a proof only for (10). First, it is important to note that $P(R)$ remains the unique Bayes procedure with respect to any symmetric prior within the class $\mathcal{D}(R)$, even if $\mathcal{D}(R)$ were enlarged to include all procedures which make use of the available observations of all of the $\pi_i$'s at stage 2, but which still restrict final selections to those $\pi_i$'s which have been selected at stage 1.

If now $P(R\mid n,m,q+1,r-1)$ is considered to be based on all available failure times up to the $(q+1)$-th and the $(r-1)$-th failures at stage 1 and stage 2, respectively, then $P(R\mid n,m,q,r)$ can be considered to be based on the same observations. The latter would just ignore, in the subset selection at stage 1, the $(q+1)$-th failure times of the $k$ populations. The former is the unique Bayes procedure with respect to the symmetric prior on all permutations of any fixed $\theta\in(0,\infty)^k$, as long as not all of the $\theta_i$'s are equal. For every prior of this type, risk and Bayes risk coincide for each of the two procedures, and thus (10) is seen to be true. This completes the proof of the theorem.

Remark. If $L$ depended on $n$ and $m$, it would naturally be non-decreasing in $n_i$ and $m_i$, $i=1,\ldots,k$. In this case, of course, one would take $n_1=\cdots=n_k=q$ and $m_1=\cdots=m_k=r$ as the best allocation of components to be tested. In a more complicated approach, $L$ could also be non-decreasing in the time until a final decision is made. This would lead to the opposite requirement of sufficiently large $n_i$'s and $m_i$'s. We shall not discuss such more difficult problems further.

The final topic to be considered here is the choice of a suitable subset-size rule $R$ for the decision at stage 1. This is a very challenging problem, indeed. Clearly, there does not exist any $R_0$, say, such that $P(R_0)$ is optimum in terms of the risk, uniformly in $\theta\in(0,\infty)^k$, within the class $\mathcal{P}_I$. On the other hand, in a Bayes approach, the optimum choice of $R$ would depend heavily on $L$ and on the chosen prior. Therefore, it seems justified to consider in more detail the natural procedure $P(t)$, say, where $R=t$ is constant, $t\in\{2,\ldots,k-1\}$. This will be done in the next section.

3.  Properties  of  the  Procedure  P(t). 

For a fixed $t\in\{2,\ldots,k-1\}$, let $P(t)$ be the following 2-stage selection procedure. At stage 1, the $t$ populations $\pi_i$ are selected which are associated with the $t$ largest $U_i$'s. Then, at stage 2, the final decision is made in favor of the $\pi_j$ which is associated with the largest of the $t$ values $T_i$ from the $\pi_i$'s which have been selected at stage 1.

A natural way of implementing the procedure $P(t)$ is to employ the so-called “indifference zone approach”, which is due to Bechhofer (1954). It allows one to control the probability of a correct selection, i.e. of finding the best $\pi_i$, over a range of parameter configurations where the best $\pi_i$ is sufficiently better than the other $k-1$ $\pi_j$'s. Thus let us adopt in the following the 0-1 loss function, which is 0 (1) if the best $\pi_i$ is (is not) finally selected. One minus the risk, i.e. the expected loss, is then the probability of a correct selection, which will be denoted by $P_\theta(CS\mid P(t))$, $\theta\in(0,\infty)^k$, in the sequel.

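Let $\Delta>1$ be fixed. For $\theta\in(0,\infty)^k$, let $\theta_{[1]}\le\cdots\le\theta_{[k]}$ denote the ordered values of $\theta_1,\ldots,\theta_k$. Then let

$$\Omega(\Delta)=\{\theta\in(0,\infty)^k\mid\Delta\,\theta_{[1]}\le\theta_{[i]},\ i=2,\ldots,k\}.\tag{11}$$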
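Membership in $\Omega(\Delta)$ only requires comparing the smallest parameter with the others. A minimal Python sketch (the function name is hypothetical):

```python
def in_indifference_zone(theta, delta):
    """True iff delta * theta_[1] <= theta_[i] for i = 2, ..., k, as in (11)."""
    if delta <= 1 or any(t <= 0 for t in theta):
        raise ValueError("need delta > 1 and positive parameters")
    ordered = sorted(theta)
    return all(delta * ordered[0] <= t for t in ordered[1:])
```

Because $\Delta>1$, every $\theta\in\Omega(\Delta)$ has a unique smallest coordinate, so the best population is well defined there.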

The next result states that the probability of a correct selection with procedure $P(t)$ can be controlled on $\Omega(\Delta)$. More precisely, if a value $P^*\in(1/k,1)$ is predetermined, then values for $q$ and $r$ can be found such that the infimum of the probability of a correct selection with procedure $P(t)$ is at least $P^*$ if $\theta$ is restricted to $\Omega(\Delta)$. It will also be shown that the parameter configuration at which the infimum occurs, i.e. the least favorable configuration (LFC), is of the “slippage” type.

Theorem 4.  For every $t\in\{2,\ldots,k-1\}$ and $\Delta>1$,

$$\inf\{P_\theta(CS\mid P(t))\mid\theta\in\Omega(\Delta)\}=P_{\theta^*}(CS\mid P(t)),\tag{12}$$

where $\theta^*=(1,\Delta,\Delta,\ldots,\Delta)$ with $k$ coordinates.
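The probability in (12) can be approximated by simulation, drawing $U_i$ and $V_i$ directly via Lemma 1: $\theta_iU_i$ and $\theta_iV_i$ are gamma distributed with shapes $q$ and $r$ and unit scale. A Monte Carlo sketch in Python; the function names, replication count, and seed are hypothetical choices, and the estimates carry simulation error:

```python
import random


def estimate_pcs(theta, t, q, r, n_rep=20000, seed=0):
    """Monte Carlo estimate of P_theta(CS | P(t)).

    Uses Lemma 1: theta_i * U_i ~ Gamma(q, 1) and theta_i * V_i ~ Gamma(r, 1),
    all independent. The best population has the smallest theta_i.
    """
    rng = random.Random(seed)
    k = len(theta)
    best = min(range(k), key=lambda i: theta[i])
    hits = 0
    for _ in range(n_rep):
        # Stage 1: keep the t populations with the largest U_i.
        u = [rng.gammavariate(q, 1.0 / th) for th in theta]
        selected = sorted(range(k), key=lambda i: u[i], reverse=True)[:t]
        if best not in selected:
            continue
        # Stage 2: only the selected populations are sampled further.
        t_stat = {i: u[i] + rng.gammavariate(r, 1.0 / theta[i]) for i in selected}
        if max(selected, key=lambda i: t_stat[i]) == best:
            hits += 1
    return hits / n_rep


def slippage_configuration(k, delta):
    """The configuration (1, delta, ..., delta) with k coordinates."""
    return [1.0] + [delta] * (k - 1)
```

For instance, `estimate_pcs(slippage_configuration(4, 2.0), t=2, q=3, r=6)` estimates the infimum in (12) for $k=4$, $\Delta=2$. Increasing $q$ and $r$ until the estimate exceeds a target $P^*$ mimics the design step described before Theorem 4, up to Monte Carlo error.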

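Proof. The probability of a correct selection for the procedure $P(t)$ at $\theta\in\Omega(\Delta)$ with, say, $\theta_1=\theta_{[1]}$, has the form

$$P_\theta(CS\mid P(t))=\sum_{\substack{s\subseteq\{2,\ldots,k\}\\ |s|=t-1}}P_\theta\{U_l<U_i,\ l\notin\bar s,\ i\in\bar s;\ U_j+V_j<U_1+V_1,\ j\in s\},\tag{13}$$

where here and in the sequel $\bar s=s\cup\{1\}$, if both $s$ and $\bar s$ appear simultaneously in an expression.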

Let $B_i=\theta_iV_i$, $i=1,\ldots,k$, be auxiliary random variables to be used in the following. It is easy to see that a lower bound to (13) is attained if, for $j=2,\ldots,k$, $V_j$ is replaced by $B_j/(\Delta\theta_1)$ in the events appearing in (13), since $\theta_j\ge\Delta\theta_1$ on $\Omega(\Delta)$ implies $V_j=B_j/\theta_j\le B_j/(\Delta\theta_1)$. Since the distribution of the random vector $\underline{W}=(B_1/\theta_1-B_2/(\Delta\theta_1),\ B_1/\theta_1-B_3/(\Delta\theta_1),\ldots,B_1/\theta_1-B_k/(\Delta\theta_1))$ is seen to be permutation symmetric, this lower bound can be represented by an integral over $\{a=(a_1,\ldots,a_{t-1})\mid a_1<a_2<\cdots<a_{t-1}\}$, where the integrand is the product of the joint density of the first $t-1$ coordinates of $\underline{W}$ at $a$ and the following function of $a$:


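$$\sum_{\substack{2\le i_1<i_2<\cdots<i_{t-1}\le k\\ s=\{i_1,\ldots,i_{t-1}\}}}\ \sum_{\sigma}P_\theta\{U_l<U_i,\ l\notin\bar s,\ i\in\bar s;\ U_{i_j}<U_1+a_{\sigma(j)},\ j=1,\ldots,t-1\},\tag{14}$$

where in the second summation, $\sigma$ runs over all $(t-1)!$ permutations of $(1,2,\ldots,t-1)$.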
To show now that (14) is nondecreasing in $\theta_2,\ldots,\theta_k$, it suffices to prove it for $\theta_k$. To do so, we first replace in (14) all probabilities by the corresponding conditional probabilities, given $U_1=y_1,\ldots,U_{k-1}=y_{k-1}$, where we may assume without loss of generality that $y_2<y_3<\cdots<y_{k-1}$ holds. Let $b_j=y_1+a_j$, $j=1,\ldots,t-1$. Then we get

$$
\begin{aligned}
&\sum_{\sigma}P_\theta\{U_k,U_{k-t}<U_{k-t+1},U_1;\ U_{k-t+1}<b_{\sigma(1)},\ldots,U_{k-1}<b_{\sigma(t-1)}\mid U_1=y_1,\ldots,U_{k-1}=y_{k-1}\}\\
&+\sum_{\sigma}P_\theta\{U_{k-t+1}<U_k,U_1;\ U_k<b_{\sigma(1)},\ U_{k-t+2}<b_{\sigma(2)},\ldots,U_{k-1}<b_{\sigma(t-1)}\mid U_1=y_1,\ldots,U_{k-1}=y_{k-1}\}.
\end{aligned}\tag{15}
$$

In case of $y_{k-t+1}<y_1$, this reduces to

$$\sum_{\sigma}P_\theta\{U_k,U_{k-t+1}<b_{\sigma(1)},\ U_{k-t+2}<b_{\sigma(2)},\ldots,U_{k-1}<b_{\sigma(t-1)}\mid U_1=y_1,\ldots,U_{k-1}=y_{k-1}\},\tag{16}$$

whereas in case of $y_1<y_{k-t+1}$, it reduces to

$$\sum_{\sigma}P_\theta\{U_k,U_{k-t}<U_1;\ U_{k-t+1}<b_{\sigma(1)},\ldots,U_{k-1}<b_{\sigma(t-1)}\mid U_1=y_1,\ldots,U_{k-1}=y_{k-1}\}.\tag{17}$$

Since both (16) and (17) are seen to be nondecreasing in $\theta_k$ (because $U_k$ is stochastically decreasing in $\theta_k$), the proof of the theorem is completed by noting that $P_{\tau\theta}(CS\mid P(t))$, $\tau>0$, does not depend on $\tau$.

It should be noted that Theorem 4 also holds for $t=1$ and for $t=k$. But $P(1)$ and $P(k)$ are actually 1-stage selection procedures. Procedures of this type have been studied extensively in the past, and an overview of the literature in this respect can be found in Gupta and Panchapakesan (1979).

A very natural and interesting question is now to find sufficient conditions under which a 2-stage procedure of the type $P(t)$ performs better than a 1-stage procedure. This could be done, for example, on the basis of a common total number of failures. To be more specific, let us assume that there exists an integer $d\ge 2$, say, such that $k=dt$. Thus, if $P(t)$ is based on $q$ failures at stage 1 and on $r=dq$ failures at stage 2, then the total number of failures becomes $k\cdot q+t\cdot(dq)=k\cdot(2q)$, and $P(t)$ can be compared with the optimum 1-stage procedure, $P_1$ say, which is based on $2q$ failures from each of $\pi_1,\ldots,\pi_k$.

If the $k-t$ largest $\theta_i$'s tend to large values compared to the $t$ smallest $\theta_i$'s, then it is not difficult to see that the $P(CS)$ of $P(t)$ will be larger than the $P(CS)$ of $P_1$. However, at other parameter configurations, $P_1$ may be the better procedure. It thus appears more promising to compare the infima of the probabilities of correct selection on $\Omega(\Delta)$, which are both attained at $\theta=(1,\Delta,\ldots,\Delta)$. The answer to the stated question would then, of course, depend on $\Delta$. No results in this respect are known.
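The failure-count bookkeeping above, and one ingredient of a numerical comparison at $\theta=(1,\Delta,\ldots,\Delta)$, can be sketched in Python. This does not settle the open question. It assumes (our assumption about its form) that the 1-stage rule $P_1$ selects the population with the largest total-time statistic based on $2q$ failures each, and all names are hypothetical:

```python
import random


def failure_budget_two_stage(k, t, q, d):
    """Total failures of P(t): q from each of the k populations at stage 1 and
    r = d * q from each of the t survivors at stage 2, assuming k = d * t."""
    if k != d * t:
        raise ValueError("the comparison assumes k = d * t")
    return k * q + t * (d * q)


def pcs_one_stage(theta, n_failures, n_rep=20000, seed=0):
    """Monte Carlo P(CS) of a 1-stage rule picking the largest total-time statistic,
    each based on n_failures failures (theta_i * T_i ~ Gamma(n_failures, 1))."""
    rng = random.Random(seed)
    k = len(theta)
    best = min(range(k), key=lambda i: theta[i])
    hits = 0
    for _ in range(n_rep):
        stats = [rng.gammavariate(n_failures, 1.0 / th) for th in theta]
        if max(range(k), key=lambda i: stats[i]) == best:
            hits += 1
    return hits / n_rep


k, t, q, d = 6, 2, 4, 3
budget = failure_budget_two_stage(k, t, q, d)  # equals k * (2 * q)
print(budget)
```

Comparing `pcs_one_stage([1.0] + [delta] * (k - 1), 2 * q)` with a corresponding two-stage estimate for $P(t)$, over a grid of $\Delta$ values, would be one way to explore the question numerically.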

4.  Some  Extensions. 

A natural extension of the topics of the previous sections is to consider $p$-stage selection procedures for $p>2$, which are based on $q_i$ failures at stage $i$, $i=1,\ldots,p$, where the selected subsets at consecutive stages are nested. For permutation invariant subset-size rules $R_1\ge R_2\ge\cdots\ge R_{p-1}$, let $P(R_1,\ldots,R_{p-1})$ denote the procedure which selects at stage $i$ in terms of the $R_i$ largest sufficient statistics for stages 1 through $i$ combined, $i=1,\ldots,p$, where $R_p=1$. As in Gupta and Miescke (1984), if the loss function is generalized accordingly, it can be shown that all final decision rules, including those which are made whenever an $R_i$ turns out to be 1, as well as the subset selection at stage $p-1$, are optimum in a way analogous to (4). Moreover, it can be shown that $P(R_1,\ldots,R_{p-1})$ is the unique Bayes procedure with respect to every i.i.d. prior among all procedures which employ $R_1,\ldots,R_{p-1}$, if the loss at stage $i$ depends only on the parameters of the populations actually selected at this stage. In this case, the procedure turns out to be admissible in $\mathcal{D}(R_1,\ldots,R_{p-1})$, the natural generalization of $\mathcal{D}(R)$.
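The nested selection of $P(t_1,\ldots,t_{p-1})$ with fixed subset sizes can be sketched as follows. Assumptions (ours, for illustration): the per-stage statistics are given as increments whose running sums are the combined sufficient statistics, by analogy with $T_i=U_i+V_i$; indices are 0-based; and the function name is hypothetical:

```python
def p_stage_select(stage_increments, sizes):
    """Sketch of P(t_1, ..., t_{p-1}) with fixed sizes t_1 > ... > t_{p-1}.

    stage_increments[s][j]: stage-(s+1) statistic increment of population j
    (read only while j is still in the running). After each stage the
    populations with the largest cumulative statistics are kept; at the last
    stage a single population is chosen (R_p = 1).
    """
    p = len(stage_increments)
    if len(sizes) != p - 1 or any(a <= b for a, b in zip(sizes, sizes[1:])):
        raise ValueError("need p - 1 strictly decreasing subset sizes")
    k = len(stage_increments[0])
    active = list(range(k))
    cumulative = [0.0] * k
    for stage, keep in enumerate(list(sizes) + [1]):
        for j in active:
            cumulative[j] += stage_increments[stage][j]
        active = sorted(active, key=lambda j: cumulative[j], reverse=True)[:keep]
    return active[0]
```

With increments `[[4, 1, 3, 2], [0, 0, 5, 0], [3, 100, 0, 0]]` and sizes `[3, 2]`, population 1 is screened out after stage 1, so its large stage-3 increment is never used and population 2 is chosen.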

The situation becomes more favorable if $R_1=t_1,\ldots,R_{p-1}=t_{p-1}$ are fixed, where of course $t_1>t_2>\cdots>t_{p-1}$ holds. The procedure $P(t_1,\ldots,t_{p-1})$ can be shown to be the unique Bayes procedure, with respect to every symmetric prior, within the class $\mathcal{D}(t_1,\ldots,t_{p-1})$. Thus it is also, uniformly in $\theta$, the optimum $p$-stage procedure, in a way analogous to (4), within the class $\mathcal{D}_I(t_1,\ldots,t_{p-1})$. The proof of these facts is essentially the same as in Gupta and Miescke (1984); one only has to take care of the slight technical modification concerning the function $d$ in (7), which has been discussed at the end of the proof of Theorem 1. One problem, however, remains open: the least favorable parameter configuration in $\Omega(\Delta)$ under a 0-1 loss, or more specifically, for the probability of finally selecting the best population, is not known for $p>2$.

Finally, some comments about $\alpha$, the shape parameter of the underlying Weibull family, have to be made. We could have allowed from the very beginning that $\pi_1,\ldots,\pi_k$ have known shape parameters $\alpha_1,\ldots,\alpha_k$, which are not necessarily identical. If $U_i$, $V_i$, and $T_i$ were defined as in (2), but now with $\alpha_i$ instead of $\alpha$, $i=1,\ldots,k$, then all subsequent results would still be true because of the facts stated in Lemma 1. However, we did not follow this idea, since the statistical relevance of selecting the population with the smallest $\theta_i$ would become rather questionable.

A more interesting problem would be the following. Suppose that $\pi_1,\ldots,\pi_k$ have a common shape parameter $\alpha$ which is unknown. Great difficulties arise in this situation, mainly because a reduction by sufficiency, as it was done quite successfully in (2), is no longer possible here. The maximum likelihood approach, as it was utilized in the two papers of Kingston and Patel (1980), would lead, for $P(t)$, to a procedure which is almost identical with $P(t)$. The only difference is that at stage 1, $\alpha$ is replaced by the maximum likelihood estimator $\hat\alpha_1$, and at stage 2, $\alpha$ in $U_i$ and $V_i$, $i=1,\ldots,k$, is replaced by the overall maximum likelihood estimator $\hat\alpha_2$, which is based on all observed failure times. It is not known how good this procedure actually is.

If a bound $\alpha_*\le\alpha$ (or $\alpha^*\ge\alpha$) were known, this would not be of any help: if we used $P(t)$ with $\alpha_*$ in (2) as a substitute for the unknown $\alpha$, the resulting procedure would perform worse at the actual $\alpha$ than if $\alpha_*$ were the right shape parameter. This is a direct consequence of Theorem 1. Therefore, a bound on $\alpha$ is of no use if we wish to control the probability of a correct selection from below.

The most promising problem, however, appears to be the search for conditions under which a 2-stage procedure $P(t)$ is preferable to a 1-stage procedure $P_1$ which employs the same total number of failures. This problem has been formulated at the end of Section 3.